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ORTHONORMAL INTEGRATORS BASED ON HOUSEHOLDER 
AND GIVENS TRANSFORMATIONS* 

LUCA DIEClt AND ERIK S. VAN VLECK* 



Abstract. We carry further our work [DV2] on orthonormal integrators based on Householder 
and Givens transformations. We propose new algorithms and pay particular attention to appropriate 
implementation of these techniques. We also present a suite of Fortran codes and provide numerical 
testing to show the efficiency and accuracy of our techniques. 
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1. Introduction. In recent times, there has been a lot of interest in techniques 
for solving differential equations whose exact solution is an orthonormal matrix, in 

|pRVl| 



such a way that the computed solution is also orthonormal; [CIZ 



[DV1] 



[DV2], [DLP|, JGSO |, JH]|, [ MR |, are a representative sample of references. In this 



paper, we will restrict attention to a particular class of differential equations with 
orthonormal solution, the QR equations. These arise in many seemingly unrelated 
applications (e.g., see pGGSj ], Q, jDNTj, pV2[ , Q), but can all be tracked back 
to the following setup. 

1. We are given the function A: t — > R™ xn , A G C fc_1 , k > 1, and consider the 
associated initial value problem for X e H™ xp , p < n: 

X = A(t)X , te [t ,t f ] , X{t )=X Q full rank. 

2. We need the QR factorization of the matrix X: X = QR, Q,R € C k , Q € 
H nxp , R 6 H pxp . For stability reasons, we want to find the factors Q and R without 
first finding X. Once Q is known, the equation for R becomes: 



(1.1) 



R = AR , R(t )=Ro, A(t) = (Q T AQ - Q T Q)R, 



where -since X — QR- A of (1.1) is upper triangular. Computation of R reduces 
to a backward substitution algorithm for the entries of R, coupled with quadratures, 
which is conceptually simple. Therefore, in this work we will focus on finding Q. 
3. Let QoRo = Xq be a QR factorization of Xq. Differentiate the relation X = QR, 
make use of triangularity of R, let S = Q T Q, observe that S E H pxp must be skew- 
symmetric, and obtain the following differential equation for Q: 



(1.2) 



AQ-QQ 1 AQ + QS , S t . 



{Q T AQ)ij, i>j 

0, i = j 

-{Q T AQ) Jl , i<j 



Q(t ) = Qo 
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Remark 1.1. Different rewritings of (|l.2|) are possible, in the case of p < n, which 



present distinct advantages with respect to the given form (1.2). In particular, the 



following rewriting proposed by Bridges and Reich in [BR| is interesting 



(1.3) Q = [(A- QQ T A) - (A T - A T QQ T ) + QSQ T ]Q . 

Remark 1.2. Triangularity of A is the key point of a successful and widely used 
technique for computing Lyapunov exponents of the A-system in decreasing order, 



i.e., the first p dominant ones; e.g., see [ BGGS | and | DRV2 1 . It is worth pointing out 



that one may not know a priori how many Lyapunov exponents are needed, i.e., the 
value of p; for example, one may need to compute all the positive onesQ In this case, 
it would be desirable to have techniques which are capable (at least, in principle) of 
increasing the number of exponents to be calculated without having to restart the 
entire computation from scratch, and that can profitably use the computations done 
so far. 

In the next section, we discuss the design choices we face when devising methods to 



solve (1.2). Then, we revisit the setup we proposed in DV2] where we integrate (1.2) 
by seeking Q as product of elementary Householder or Givens transformations. In 
Section || we discuss numerical issues which must be confronted when implementing 
techniques based on these elementary transformations. We also put forward new 
formulations in the Householder case. In Section^, we present FORTRAN codes we have 
written for the task, and we illustrate their performance on a number of examples. 
Conclusions are in Section ||. 

We must point out right away that in this work we are exclusively concerned 
with the case of coefficient matrices A(-) which are dense; that is, they have no 
particularly exploitable structure. In case A is structured (symmetric, banded, etc.) 
some computational savings ought to be possible, and we will consider these cases in 
future work. 

2. B ackground and Householder and Givens representations. Integrat- 



ing (L2) is an example of integration on a manifold. Quite clearly, we can view 
the solution Q as a curve on the compact manifold of orthonormal (orthogonal if 
p = n) matrices. The dimension of this manifold is p n ~%~ > which is therefore 
the number of degrees of freedom one has to resolve. In general, a smooth mani- 
fold may be parametrized in many different ways, and the choice of parametrization 
may turn out to be of utmost importance from the numerical point of view. Clearly, 
a minimal parametrization requires p 2 ""^" 1 parameters. We notice that a typical 
construction in differential geometry is to parametrize a smooth manifold by overlap- 
ping local charts; this may be a convenient way to seek the solution of a differential 
equation evolving on a manifold even when the manifold itself may happen to be 
globally parametrizable. Indeed, from the numerical point of view, it is important 
that the parametrization lead to a stable numerical procedure. For example, it is 
not enough to know that the solution of a differential equation on a manifold has a 
global parametrization to make adopting this parametrization numerically legitimate. 
In particular, if we choose a representation for the solution Q of (|l.2[), even when it 



happens to be globally defined, it may not give us a sound numerical procedure. 



Since the solution of (1.2) exists for all times, one may think that direct integration 



of the IVP for Q is the right way to proceed. However, a standard numerical method 



Lyapunov dimension and physical measure, see [ER 
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used on (1.2) will not deliver a solution which is orthonormal at the meshpoints. For 
this reason, several techniques have been proposed to obtain an orthonormal numerical 
solution. 

[p = n ] In this so-called "square case", there are many competing approaches. In 
our opinion, the ones below are the most natural and/or interesting. All 
of these choices have been implemented to varying degree of sophistication, 
and all present intrinsic implementation difficulties: without entering in the 
specifics of these methods, it is enough to state that none of the methods 
below is trivial to implement, with the exception of (ii), and that all them 
can be implemented so to have an expense proportional to 0(n 3 ) per step, 
which is the expense of evaluating the right hand side of (1.2)p|. 

(i) Rungc-Kutta schemes at Gauss points used directly on ([1T4); [DRV1]. 

(ii) Projected methods. There are at least two of these: (a) bad and (b) 
good. 

(a) These methods use the original differential equation for X, integrate 
this, and then form its QR factorization at each step. They are nu- 
merically unsound, since integration of X is quite often an unstable 
procedure (as when the problem is exponentially dichotomic). 

(b) These consist in using any scheme to integrate ( |1.2| ), and then pro- 
jecting the solution onto the manifold of the orthonormal matrices. 
For example, the pro jectio n step c an be done with a modified Gram- 
Schmidt procedure (as in [ DRV1 1 ) or by using the orthogonal polar 
factor (as in |h]]). 

(hi) Transformation methods. Here, one transforms the equation for Q, 
which is an element of the (Lie) group of the orthogonal matrices, into 
an equation for a skew-symmetric matrix Y, an element of the underly- 
ing (Lie) algebra. The equation for Y now evolves in a linear space, and 
standard discretizations will deliver a skew-symmetric approximation: 
transforming this back to the group gives the desired approximation for 
Q. There have been at least two ways in which this design has been 
carried out: 

(a) using near the identity transformations and the matrix exponential 
to get back on the group (e.g., see |Mu| ), and 

(b) using the Cayley transform to get to the algebra and then back to 
the group (e.g., see [ DLP[ ]). 

\p < n ] Here, the solution belongs to the manifold of rectangular orthonormal ma- 
trices, the so-called Stiefel manifold. We must appreciate that often one has 
p <C n, and thus we must insist that a well conceived method should have 
cost proportional to 0(n 2 p) per step, that is the cost of evaluating the right 
hand side of (p_.2j). As it turns out, and to the best of our understanding, 
this restriction rules out most of the methods we listed above for the p — n 
case, with the exception of projection methods. However, the reasons why 
the other methods must be discar ded are different from one another: Gauss 
RK methods ap plied directly to (L2) do not maintain orthogonality in the 
case p < n (see | DVl[| ), and transformation methods require going from the 
group structure to the algebra and back: apparently, this r equ ires an 0(n 3 ) 
expense at some level. Notice that Gauss RK schemes on (|l.3[) do maintain 



2 we ignore the expense of computing A(t), since this is problem dependent, and we only focus 
on the linear algebra expense; further, recall that we only consider the case of A "dense" 
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orthogonality, but it is not clear to us that they can be implemented so to 
require a 0(n 2 p) expense per step without forcing a possibly severe stepsize 



restriction; for this reason in | BR | the authors adopt a stabilization procedure 



which allows for explicit schemes to be used on (1.3); in essence, their method 
becomes akin to inexact projection methods. 

Based upon the above discussion, and with our present knowledge and under- 
standing, it would seem that projection methods are the only survivors amongst the 



methods recalled above. We will see below that the techniques put forward in DV2] 
and revisited in this paper are a better alternative, but before doing so, let us point 
out some inherent characteristics and potential limitations of projection methods, 
(i) Perhaps the major obstacle to use the (good) projection methods is of the- 
oretical nature: projection methods are hard to analyze. Of course, if any 



method of order s, say, is used for (1.2), then the projected solution is also of 



order s. However, the projection step itself is a discontinuous operator, and 
this has been cause for some worries. 

(ii) Part of the worries are probably caused by the difficulty of getting a backward 
error statement for projection methods. In the numerical analysis of differen- 
tial equations, a backward error analysis consists in the realization that one 
has solved (at any given order) a problem which is 0(h s ) close to the original 
one. We refer to |HL| for this point of view, which has proven valuable for 
many problems, and most notably for Hamiltonian systems. However, this 
point of view is perhaps less relevant in the context under examination here. 
After all, suppose we obtain some orthonormal Q at the grid points tk in- 
stead of the exact Q, and some R, instead of the exact R. Naturally, we can 
assume that both Q and R are 0(h s ) approximation to the exact matrices. 
Therefore, we have triangularized some Y(tk) instead of Y(tk), and obviously 
Y(tk) is 0(h s ) close to Y(tk). In general, should we expect something better? 

(iii) In adaptive integration mode, there are some undesirable features of projec- 
tion methods. For the sake of clarity, suppose that the time stepping strategy 
is based on two formulas of different orders. If we control the error on the 
unprojected solution, then it may occasionally happen that we will reject a 
step which would not have been rejected if we had checked the error on the 
projected values (or, similarly, we may accept a step which would have not 
been accepted for the projected values). On the other hand, if we control the 
error on the projected values, then we essentially increase the work, because 
now two projections have to be performed. Finally, and regardless of how we 
control the error, it is undesirable that when we end up rejecting a step we 
had to approximate all of Q in the first place. It would have been preferable 
to realize that the step was not going to be successful ahead of computing all 
of Q. Admittedly, this may sound as a strange request, but we'll see below 
that it is possible to achieve it with well designed methods. 

(iv) To conclude, and in spite of the above observations, we must say that our 
experience with projection methods has been quite positive. For this reason, 
in Section ||[ we will compare performance of our new codes with a projection 
method. 

We are now ready to list which properties -in our opinion- a method to solve 



( |l.2|) should have. At this point, the word "method" must be read as: "formulation 
of the task in a form which in principle allows for the properties below to be satisfied" . 
In so doing, we maintain the freedom of delaying discussion of which formulas will be 
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used in practice. 

1. The method must deliver an orthonormal approximation, at the very least 
at the mesh points. If a RK type integrator is adopted, the method must 
deliver (at no added cost) an orthonormal approximation also at the internal 
abscissas. This will guarantee that forming the matrix A at the internal 
abscissas is a numerically stable procedure. 

2. The method must be flexible enough to handle without modifications both 
cases p = n and p < n. 

3. The method must have a cost per step of 0(n 2 p), and never require a 0(n 3 ) 
expense when p < n. 

4. The method must be based on numerically sound coordinates. 

5. The method must allow for integration of the relevant differential equations 
with theoretical order restrictions given only by the degree of differentiability 
of Q. 

6. The method must be well suited also in adaptive step-size mode: (i) we want 
to be able to increase the stepsize if Q evolves slowly, (ii) we want to be 
able to reject a step which is going to be unsuccessful as quickly as possible, 
possibly (much) earlier than having completed approximation for all of Q. 

7. The method must be flexible enough to allow for increasing the number of 
columns of A which we want to triangularize, without having to restart the 
entire triangularization process from the beginning, and it should be powerful 
enough to exploit the work already done. 



As we will see, the methods we laid down in [ DV2 achieve the above points. These 
methods are based upon writing Q as product of Householder or Givens transforma- 
tions, and since, in general, these elementary transformations do not vary smoothly 
on the whole interval of interest, we have to adopt a "reimbedding" strategy in order 
to obtain a well defined process. We refer t o |DV2 | for the original derivation of this 
approach, which we will review in Sections 2.1 and [2.2] . 

Now we consider a different interpretation of these methods based on Householder 
or Givens transformations. In what follows, we will assume that we have to integrate 
(1.2) when p < n; trivial simplifications take place if p = n. 

We begin observing that, in principle, the sought solution Q can always be written 



in the redundant way Q(t) — [Q(t) Q ± (t) ] 



U(t) 



orthogonal complement of Q. Thus, we can think 
following representation for the sought solution Q: 



Vt, where Q is the 



-at least in theory- to have the 



(2.1) 



Q(t) = U(t k ,to)Q(t,t k ) , t>t k , Q(t k ,t k ) = 







In Q, U(t k ,t ) E R nxn and Q(t,t k ) € H nxp . The matrix U(t k ,t ) is at once 
comprising Q, at t k , and its orthogonal complement. Now, let U k :— U(t k ,to), and 
set 



A(t) = UlA{t)U k ,t>t k . 



With this, we have that the equation satisfied by Q(t, t k ) is again (1.2) with A replac- 
ing A there. With abuse of notation, if we still call Q(t) what is really Q(t,t k ), for 
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t > tk, we would then have 

(Q T AQ) tJ , i>j 

(2.2)Q = AQ- QQ T AQ + QS , S l3 = { 0, i = j , Q(t k ) = 

'{Q T AQ)ji, i<j 



(The equation for R, (1.1), is also modified trivially with A replacing A.) 



Thus, quite clearly, knowledge of Uk would allow us to integrate the equation ( 2.2 ) , 
starting "near the identity" ; ostensibly, this can be done with a number of choices 
for representing Q(t, tk) valid locall y, i f not globally (recall our discussion on local 
charts). Now, suppose we can solve ( |2.2| ) from t k to t^+t, obtaining Q(tk+i), and that 
at the same time we are able to obtain also the orthogonal complement of Q{tk+\, tk), 
Q x (tk+x,tk); then, we could form the matrix Uk+i = [Q{tk+i) Q ± (tk+l)} Uk, rede- 
fine A and Q(t,tk+i) accordingly for t > tk+i and then again formulate a differential 



equation for Q(ttk+i) identical to (2.2) for t > tk+i- We could then repeat this basic 



setup until we arrive at tt. Proceeding this way, we would always have to solve differ- 



ential equations with initial conditions 



Naturally, in practice we will only have 





computed approximations, and not exact values, but the setup remains unchanged. 

Notice that, unless we explicitly compute also Q 1 - so to triangularize all of A (and 
this would cost 0(n 3 )), the transformed matrix A, at the t k s, is upper triangular only 
in its left (n,p) part, in particular its bottom (n-p,n-p) block is not triangular. But, 
suppose that we now decide that we really needed to triangularize more columns of 
the fundamental matrix solution X (or, which is the same, wanted to transform larger 
part of A to upper triangular) . If, somehow, we managed to keep track not only of 
Q(t), but also of its orthogonal complement, then we could work only on the bottom 
(n-p,n-p) piece of the matrix A. In other words, at the price of added memory of 
course, we would avoid having to restart from scratch. But, is it possible to obtain at 
once information on Q(tk+i,tk) and Q (t/.+i, t/.), for a cost proportional to 0(n 2 p) 
per step? This is where we can in principle exploit the representation for Q(t,tk) in 
terms of either Householder or Givens transformations. In fact, with these choices, 
one does get information on both Q(t, tk) and its orthogonal complement at the same 
time, at the expense of computing only Q{tk+\, tk)- 

Warning. It should be stressed right away that Householder (or Givens) transforma- 
tions are not globally defined; this means that if we want to represent Q(t) in fl2.1| ) by 
using these transformations, in general we cannot expect to have a globally smooth 



representation. As we showed in [DV2|, it is a trivial matter to recover for free a 
smooth representation for Q(t). However, it is not trivial at all to recover for free a 
smooth representation for Q 1 - (t) (see also [OS]). After all, it is hardly imaginable that 



one can obtain a smooth representation for the orthogonal complement at the price 
of only getting Ql Regardless, lack of smoothness in the obtainable representation 
of Q is not a concern in the contexts of which we are aware. For example, if the 
orthogonal complement is needed because we intend to carry further the triangular- 
ization process of A, then this can be done without restarting the entire computation 
from the beginning (but it is not a trivial implementation to do, since we need to save 
in memory all quantities which have been computed thus far). 

Now, since Householder and Givens transformations are not unique (there is a 
sign ambiguity for each Householder reflection, and a decision to be made on the 
order in which we apply the Givens rotations) , we must specify the way in which this 
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lack of uniqueness is resolved. This extra freedom means that a choice must be made 
between different coordinates systems to represent the same object. The best way to 
resolve this ambiguity is to make sure that we pick up the soundest coordinates from 
the numerical point of view. 

These ideas have been carried out (in a different notation) in DV2|. There, 



we wrote the solution Q(t) of (f.2), locally, as product of elementary Householder 
or Givens transformations. We are ready to review the basic algorithms based on 
Householder and Givens transformations. We will not explicitly take advantage of the 
representation ( |2.f| ), but, by writing Q(t) solution of (f.2) as product of elementary 
orthogonal matrices of Householder or Givens type, we are conceptually representing 
Q as in (|2.1|), with nonsmooth U. 



2.1. Householder transformations. Suppose we are at t k and that we know 
X(t k ) (e.g., t k = t a ). Then, to find Q{t) such that Q T {t)X(t) = R{t), for t > t k , we 
look for Q T (t) = P p (t) ■ ■ ■ Pi(i), with P^t) = Pf(t) the Householder matrices 



Pi(t) = 



h-i 






h(t) 



Qi(t) = I- 



uj{t)ui{i) 



After (i — f) transformations, the matrix X got transformed into Pi—i . . .P\X and 
its first (i — 1) columns have been triangularized. Let us still call X the transformed 
matrix, and let Xi — X{i : n,i) be its i-th column we need to triangularize. This is 
the role of Pi. So, we will set 

(2.3) Ui(i) = Xi(t) - <Ji\\xi\\ei , 

and continue the triangularization process. The standard textbook choice for the 



value of Oi is (see [GVLQ: 
(2.4) 



-1, if e^(ifc) > 0, 
1, if efxiltk) < 0. 



This is the idea. But, of course, we do not have X except at £o- hi | DV2 | we 
showed that differential equations can be set for the Mi's directly, and for the norm 
of the vectors Xi, requiring only knowledge of the coefficient matrix A. Moreover, 
we also derived differential equations for the Householder transformations in different 
variables, namely for 



(2.5) 



INI ' 



Qi= I - 2viVi 



which are better scaled variables, since vfvi = 1. 

We now recall the differential equations for the u and w-variables. For simplicity, 
we omit the subindices, and thus use the notation u for Ui, etc., and also use A for 
A(i : n,i : n), where the matrix A has been progressively modified by the accumulated 
transformations : 



(2.6) (A,P 3 ) -update : A(t) := P 3 {t)A{t)P 3 {t) - P 3 {t)P 3 (t) , j = f, 
With this understanding, for the u-variables we have: 



t 



u 




' A-2e x e\A a (A - e\e\ A s )e{ 




u 


u T A s u 


ei 


a\\x\\_ 




2e\A s ejA sei 




_cr||x||_ 


<t\\x\\ 


-f 
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Here , A s = 1/2(A + A T ), ej is the first unit vector (of appropriate dimension), and 
(|2.7|) must be understood as a differential equation for the itj 



For the w-variables, we have the following. Partition v a,s v — 
be the first column of A, (0, aj a ) be the first row of ^(A — A T ), 



i = 


1, . . . ,p. 






"(eft;)- 


, let 


an 




V 


Si 



an 

Ol,s 



be the 



first column of A s , and let A and be the submatrices obtained from A and A s , 
respectively, by deleting the first row and column. Then, we have 



(2.8) 



where we have set 



b = 



2{elvf 



d 


■(eft,)" 




' c T -b T ~ 






dt 


V 




b-c S-S T _ 




V 


Av{ejv) , 


c T :-- 


= {-a{ a v + a)v T , 





2{ejv) 

and a := (an (eft;) + of s «)(2(ef v) 2 - 1 ) + 2(ef v)?) T (a M (ef u) + A s 0) . 



-ai + J 4f))u T , 



The differential equations (2/7) and ( |2.8| ) can easily be supplied with initial con- 
ditions at to since Xo is known, and we can find Qo via Householder transformations 
(in either u or v formulation, with the cr's satisfying (2.4)). However, to describe 
the typical step between tk and t^+i = tk + hk, and regardless of the choice adopted 
between u or v-variables, the expression (^J|) used for choosing the sign of a must 
be modified since in practice we do not have X(tk)- The way it is done is to enforce 
(2.4), but by only keeping track of the transformations, ft goes as follows. 

Suppose we have found the Householder matrices at tk, coming from tk—i, call 

'Ii-i 

. ■ Q { t ] ^). 

dition for the Householder matrices (that is, different it's or v's and cr's) we need in 



them P- 



(k-i) 



(t k ). CeRP$ k \t k ) = 



the possibly different initial con- 



order to step past tk ■ Let Kq 
(2.9) 



(k) 



I n . Inductively define a\ 



(fc) 



1. 



,p, as follows: 



:= 



-1, ifaf^ef^Qf-^O, 
-1, iiaf~^ejK\ k J 1 Qf- 1 h 1 <Q : 



where the matrices K^_ x G IR 



> 71 — 2+1,71 — 2+1 



,i = 2, 



, p, are defined by 



and, for i = 1, . . . ,p, the matrix {tk) is obtained so that 



(k-i)(k) 



K. 



(fe) 



Thus, we need to find a^| Householder transformation which transforms the left-hand 
side of this last equation into cj fc 'ei. This trivially gives new ICs for the v[ (tk); the 
sign ambiguity in the vector v\ (tk) 1S resolved by forcing the sign to that of the first 



3 not unique, there is typo in [DV2 
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component of (t&). Thus, we can prescribe new ICs for the vl K, (t k ), and from 

these it is simple to write ICs for the u^s: 

uf\t k )^^2a\ k \\xS k )\\{e T l vt\t k ))v^\t k ). 

The proof of the following Lemma is omitted since it amounts to a simple verifi- 
cation. 



(fc), 



Lemma 2.1. The choice (2.i) is the same as (2.4) 



Remark 2.1. Let us substantiate the claim that the choice (2.4) ensures that 



we are using sound coordinates from a numerical point of view. In linear algebra, 
see [GVL , the choice of signs as in (2.4) is justified in order to avoid subtraction of 



(possibly) nearly equal numbers. Of course, this is still true in our context, since we 
will need to form the reflectors. But there is also another aspect to take into account 
in the present context. We restrict to the ^-coordinates, since these must be generally 
preferred to the u-coordinates (but see the ^-coordinates of (3T) in Section ||). We 
seek an orthonormal function Q, and we are choosing local coordinates to represent it. 
Obviously, every column of Q, Qei, has unit length, and we represent these columns 
as Pi.. .P\e j\ t o do this, we find the Vi's, each of wh ich has itself unit length, by 
integrating (2_J). From the differential equations ( |2.8|) , we observe that there is a 



division by efvi in forming the vector 



2(e^,) 2 



—v of S (here, e\ = (1,0, 



,0f G 



Ji n ~ l+1 ). For stability, we must ensure to avoid division by small numbers. But, with 
some simple algebra, one can see that: 
The choice (2.9) is equivalent to having 



(2.10) 



n— i+1 
J=2 



In particular, the vector ^^gr^ — -v is as well scaled as generally possible. Moreover, 



(2.10) can be used to decide if the current Householder frames are numerically stable 
or not. 



In the u-variables, it is easier to check (2.4) directly, since ejxi = efui + <7j||:Ej||; 
thus, as long as 



(2.11) 



<Ji(e[ui + (Ti\\xi\\) < , i=l,...,p, 



there is no need to change the current frame. 



We now summarize the overall strategy on a typical step from t k to t k+ \ = t k + h k . 
In the next section, we will pay closer attention to all aspects of this strategy. 

Householder on [t k ,t k +i]. 



INPUT: t k , h k > 0, initial conditions pj; k l \t k ) 



i = 1, 



and or the vectors v[ k ^ at t k ), and the of X \ 



,p (i.e., the vectors u. 



(fe-i) 



p, check to see if ( |2.10 ) (or (2.11)) holds true. If it fails, rede 



(1) For i = 1, 

fine according to (|2.9|) and determine new P^' (t k 



accordingly (redefine u[ k \t k ) or v 4 w (£&)), for i 
• For i = 1 , . . . , p 



Q (k \t k ) 
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(2) Let A = A(i : n,i : n) 

(3) Find the Householder transformation (t) by integrating (2.7) or (2 
on [t k ,t k+1 ] 



(4) Do an {A, Pi) update fl2.6| ) 
• Endfor i. 

OUTPUT: Q( fe )(i fc+1 ) T = Pj, k) (t k+1 ) ■ • •P 1 (fc) (i fe+1 ), is such that Q^(t k+1 ) T X(t k+1 ) is 
triangular. 



2.2. Givens transformations. Suppose at tk we know X(tk) (e.g., = to). To 
find Q(t) such that Q T (t)X(t) = R(t), for i > i fe , we look for Q(t) = Q x {t) ■ 

'h-x 

Gi{t) 

product of elementary planar rotations (Givens, or Jacobi, transformations) of the 
type 



where Qi(t) is of the form Qi(t) = 



, and each Gi , i = 1 , . 



?p(*)> 
, p, is the 



Qii(*) = 

for j = 2, . 



I-(eieJ + e j eJ) + G i 



Gi 



c l3 (eie{ + ejej) 



,(eie 



e,e 



i + 1. Above, we have used and to express cos(#ij(i)) and 



= sin(^ij(i)), respectively, where the function 9ij(t) needs to be found. Now, 
suppose we have triangularized the first i — 1 columns of X, and still call X the 
transformed matrix. The role of Gi is to triangularize xi := X(i : n, i), the i-th 
column of the unreduced part of X. 



In the standard linear algebra setting (sec GVL |), the rotators are safely applied 
in their natural sequence Qi^+i, . . . , Qi. n - But this may lead to instabilities in our 
time dependent setting! In fact, the specification of the order in which the rota- 
tors Qi } i+i, . . . , Qi t n are applied turns out to be of utmost importance for numerical 
stability, and therefore for accuracy and efficiency. We adopted the following strategy. 
Order of rotators. Our strategy is 

(2.12) First rotator must annihilate largest entry of Xi(2 : n — i + 1) . 

To be precise, define / to be the largest entry in absolute value of Xi(2 : n — i + 1): 

(2.13) I : X i+ i_ii= max |X i+3 _ M | , 

2<j<n-i+l J 

and define the index array 7Tj as 

(2.14) 7Ti = [1, 1, 2, . . . , I - 1, 1 + 1, . . . , n - i + 1] . 
Then, define (the ordering of the rotators) Gi as 

(2-15) Gi — Qj )7ri (2) • " • Qi )7 ri(n-i+l) ■ 



Remark 2.2. There seems to be no need to further refine ( 2.1 2| ) by selecting the 
second rotat or to annihilate the largest entry of the unreduced part, and so forth. 

In | DV2 1 , we derived differential equations for the elementary rotators, that is for 
the 9ij or for the corresponding (cos, sin) pairs. To recall, omitting the row index i 
(i.e., using 8j for 0y, etc.), these are 

d , 



(2.16) 



C 7Ti(3) ' ' ' c iTi{n-i+l) 3t t7 3r i (2) 
c 7Ti(4) ' ' ' Cm(n-i+l) S^i(3) 



^"Ki (n— dt ^-Ki(n— i) 
d 













^7Ti (n — i) 




_ ®-7Ti (n — 2+1) _ 
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Here, for j = 2, . . . , n — i + 1, we have set 

a 7r,(j)W = e wi(j) [Qf,TTi(n-i+l) ' ' • < 9i^r 1 (2)^ ( 2i,M2) " ' ' Qi,-Ki{n-i+l)\ e l - 

and A is really A(i : n, i : n), which has been progressively modified by the accumu- 
lated transformations: 



(2.1f#,Qj)-update : A(t) := Q 3 {t) T A(t)Q {t) - Qj(t)Qj(t) , j = l 



,i- 1 . 



Of course, since ^ 



cos(%) 
sin(%) 



cos(%) 



from (2.16) it is trivial to write 



differential equations for the (cos, sin) pairs directly: 



-7Ti(3) • • ' <-V;(ri-i+l) di 



C ^(2) 

S 7T,(2) 



,1 



7Ti (n- 
TTi (n- 



-^(2) 

a Wj (2) 



C ^i(2) 

S 7Tj(2) 



■^7Ti (n— ' 



7Ti(n— 



^7Ti (n—i) 

o 





7Ti (n—i) 
7Ti (n — i) 

^7Ti (n — i+1) 



(2.18) 

To suppl y ini tial conditions a t tp fo r t he di fferential equations ( [2.16j ) and/or ( [2.18] ), 
we enforce ( |2 .12 ); that is, use ( 2.13 ), ( 2.14 ), and apply the rotators in the order 
specified by ( [2.15| ). At each application of an elementary rotator on Xi at to, there is 
a sign ambiguity; we resolve this ambiguity by forcing the first entry of Xi to be always 
positive as the rotators are applied (and thus, equal to \\xi\\ after all the elementary 
rotators Qi )7ri (2)) ■ ■ ■ , Qi.^fn-i+i) , are applied). This way, we can start the integration. 

But, to describe the typical step be tween tk an d tk+ i = tk + hk, the strategy just 
outlined, in particular the expressions ( 2.1 3| ) and (2.14), must be modified, since we 
do not have X at tk- To understand the way we do this, we first need the following 
remark. 

Remark 2.3. Givens transformation provide another example of the situation 
we described at the beginning of this work, when we discussed how a trajectory 
on a smooth manifold can be parametrized by overlapping local charts. Then, we 
said that an appropriate way to choose local coordinates was to enforce numerical 
stability. We now justify how this can be achieved with the choices we have adopted. 
Without loss of generality, assume that 7T; = [1, 2, . . . , n — i + 1] (if not, relabel the 



indices accordingly). From the differential equations (2.16) and (2.18), we notice that 
multiplication by the inverse of the diagonal matrix 



diag(c 3 • 



-•t+li C4 • ■ 



1 



is taking place. For numerical stability, we need to avoid division by small num- 
bers. Clearly, the smallest number by which we are dividing (in absolute sense) is 
Let Xi j denote the jth component of Xi, j — 1,2, ... ,?i - i + 1. Then, 



c 3 • 



using rotators to triangularize Xi, we have 



(n 



01 



3 

1 = 1 ■ 



'i.U 
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and so 



Now, as long as 
(2.19) 4j 



( X i,l + x 1.2) 



' i,2 



> X 



hi 



3 = 3, 



1,1 = 1, 



■P ■ 



no division by small number is taking place, since when (2.19) is satisfied we have 

22 ^1 

c 3 ' ' ' c n-i+l £- 



(ra-i) ' 



which is as nicely bounded away from as any product of cosines for Givens' transfor- 
mations can generally be. Furthermore, with some simple algebra, one can see that: 
The inequality ( |2.IS|) is equivalent to having 



(2.20) 



n 

J=3 



2^2 
Cj > S 



k • 



, ,p. 



Clearly, ( ^20|) can be used to decide if the current ordering of Givens' transformations 
is numerically stable or not, without knowledge of X. 



We are ready to describe how we modify the s trate gy which led us to ( 2.13 ) and 
(2.14) after the first step. First of all, as long as ( [2.20] ) holds, we do not change the 
present ordering of the rotators. In case (2.20) fails, we enforce (2.12), but by only 
keeping track of the transformations, in the following way. Suppose we have found 



the Givens transformation matrices at tk, coming from tk-i, call them Q\ k (tk) 



Call Qf\t k ) 



the possibly different initial condition for the new 



h-i 

Givens matrices (that is, different 0's or cosines/sines) we need in order to step past 
tk- Define = I ni and inductively define 



(2.21) 



-K^Gf-^^ei, far i = l,. 



Let I be the index of the largest entry (in absolute value) of iWj(2 : n — i + 1). 
Accordingly, define iTi as in (2.14) and the ordering for the rotators relative to G\ (tk) 
as in ( 2.15| ). Find initial conditions for the Q[ k j(tk) by enforcing that all vectors 



n 



n — rn-\-l , 
J=2 



U) 

c 



1, . . . , i, have positive first component. This 



way, we define G\ k \tk), hence Q^\tk). Finally, for i = 2, 



,p, we define k9j 1 G 



Qf\{tuf 



h-2 





K 



(fc) 

i-2 



Ql\ 1] (t k ) 



h-i 
o 



K 



(fc) 



Again we omit the proof of the following Lemma, since it is a direct verification. 
Lemma 2.2. The choice just described for pro viding initial conditions on the 
rotato rs at tk is equivalent to the strategy based on ( 2.15 ) and first paragraph after 
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We summarize the overall strategy on a typical step from t k to t k+ i = t k + ftfc. 
We assume that integration has been successful up to tfc, and that the step h k has 
been selected. 

Givens on [t k ,t k+1 ]. 

INPUT: tf., hk > 0, initial conditions Q\ k ~ 1 \tk) , i — 1, . . . ,p (i.e., either the (cos, sin) 
pairs or the 9 values, and the ordering in which they had been applied). 



(1) For i = 1, . . . ,p, check to see if (2.2C) holds true. If it fails, redefine the index 



array Hi and initial conditions for the rotators at t k . That is, for i = 1, . . . ,p, 
let Q\ k ' — QiTr t t2) ■ ■ ■ Qi,TTi(n+i-i), and find initial conditions for Q\ k \tk) by 
bringing Wi(i : n) into ei so that all rotated vectors Y\^Z™ +1 {Q^ldj) (tk)) T Wi, 
m = n — 1, . . . , i, have positive first component. 
For i = 1 , . . . , p 

(2) Let A = A(i : n,i : n). 

(3) Find the transformation Q[ k \t) by integrating the differential equations 



( p6b or (|I1) on [t k ,t k+1 



(4) Do an (A,Qi) update (p7| ). 
• Endfor i. 

OUTPUT: Q^(t k+1 ) = Q^fe+i) ■ • -Q^ife+i), is such that [QW (t k+1 )) T X (t k+1 ) is 
triangular with positive diagonal entries. 

Remark 2.4. We observe that -even through a change of initial conditions- with 



our strategy the signs on the diagonal of R, see (1.1), remain fixed when using rotators 
to find Q. 

3. Implementation. Here we describe how we implemented the algorithm put 
forward in the previous section. Before doing so, however, we want to derive a dif- 
ferent formulation for the Householder transformations, which present some distinct 
advantages over both the u and v formulations. The motivation comes from Remark 



2.1 



We experimented with two different rewritings of the Householder vectors. 



(a) With the notation of (2.8), we observe that (efv) = a(l — iFv) 1 / 2 . Thus, one 
can write differential equations just for v, and then recover all of v by using 
(efv) = o^ — iFv) 1 / 2 . On the surface, this seems to present some advantages, 
since there is one less differential equation to solve. However, there is a 
potential loss of precision which occurs with this approach when forming 
back ejv. After all, we know that the first component of v dominates all 
others; thus, we expect that some of the other components will be small (and 
we often observed this to be true in practice). Indeed, in our experiments, 
this formulation gave consistently less precise results than the v formulation 
or the w formulation to be introduced next. For this reason, henceforth we 
will not present computational results obtained with this formulation. 

(b) Consider the new variable w: 



(3.1) w = -Tp- v, thus w = 



e{ v 



Observe that 



—a 
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so that the discussion relative to the choice of cr's (see ( |2.9| )) stays unchanged. 

we have one less differential equation to solve and 



Since w = % = 



a simplified form for the (A, Pj)-updates. Omitting the indices for simplicity, 
the form of the typical update becomes 



(3.2) 



PAP - PP 



A 



-(w{w T A) + (Aw)w T ) + 4 



w Aw 
(w T w) 



-ww 



-(WW —WW ) 



From ( |3.2| ), it is easy to derive the differential equation satisfied by w. In fact, 
this can be obtained from the requirement that (PAP — PP)e\ has only its 
first entry not 0. Using the form of it?, and the same notation used to derive 
(2.8), this requirement becomes 



-((an + w T ai)w + &i + Aw) + 4 



ai - - 

from which we get the equation for w: 

/o o\ dw r * n w T Aw-, „ 

(3.3) — = [au+w ai-2 — = — \w ■ 
at w 1 w 



w Aw 
(w T w) 2 



-w = 0, 



(1 



T 

w w 



)&i + Aw . 



Remark 3.1. Notice that the division by w T w in (3.3) is perfectly safe, given the 
form of w. Of course, if one uses the w-variables, obvious modifications take place in 
the skeleton of the algorithm on pageS. E.g., (2.10) now would read 



(3.4) 



+ ••• + <*) > o. 



To sum up, for Householder methods, we have considered three possibilities: (i) it- 
variables, (ii) v- variables, and (iii) w- variables. In the f- variables, we need to integrate 



(2.8) whose solution has norm 1 for all t, and we must maintain this property under 
discretization, at gridpoints. In principle, we could use Gauss RK schemes for this 
task; we did this in [DV2], by using Newton's method to solve the resulting nonlinear 
system, but this gives a cost of 0(pn 3 ) per step which is more than what we are willing 
to pay. Use of the linearly convergent scheme of [DRV1] forces a severe stepsize 
restriction, which is clearly incompatible with adaptive time stepping. For these 
reasons, for the v- variables, in this work we propose integrating ( |2.8| ), with an explicit 
scheme followed by a renormalization at each step. The resulting method is a "local 
projection" method for the w-variables and has cost of 0(n 2 p) per step. As far as 
the integration for the u or u>- variables, integration can be carried out with explicit 
schemes, with no need of renormalization. Notice, though, that when forming the 
Householder transformations, and hence Q, one is in fact normalizing things so to keep 
them orthogonal; this is obvious: in the matrix I — 2ww T /w T w, the term ww T /w T w 
is indeed an orthogonal projection. 

As far as methods based on Givens transformations, so far, we spent more time 
t rying to obtain good codes only for the formulation in terms of the 9- variables, see 
( 2.16 ), and not for the (cos, sin)-variables. There are several reasons for our choice. 
First, a simplicity reason: it is simpler to work with the 9- variables, and t hen form the 
rotators. Second, if we work with (cos, sin), then we must integrate (2.18) maintaining 
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the solution of norm 1: again, we could use Gauss RK schemes with Newton's method, 
or a "local projection" technique. But, in all cases, ( 2.18| ) is twice the dimension of the 
system to be solved with the (^-variables. Repeated use of the trigonometric identity 
cos 2 a + sin 2 a = 1 would allow us to half the dimensions, of course, but at the price 
of added nonlinearities. Third, there is an accuracy reason to prefer the ^-variables; 
in all our tests, they gave more accurate results than their (cos, sin) counterpart. An 
explanation for this fact is the content of the next Lemma. 

Lemma 3.1. Let 9 be the angle associated to the elementary rotation Q{9) = 
~cos(6>) -sin(0)~ 
sin(0) cos(6>) 

rotator associated to <f>. Consider the error matrix E := Q T {9)(Q(4>) — Q(9)) 
6 — <j) = n, sufficiently small, then the error on the diagonal of E is 0(n 2 ): 



Let <j) be an approximation to 9, and let Q((f>) be the elementary 

If 



E = 



+ 0(r] 4 ) ri + 0{n 3 ) 
-n + 0{^) -^ + 0(ry 4 ) 



2 



Proof. The proof follows from Q T {9)Q(4>) 



cos(# 
— sin( 



cos( 



□ 



3.1 



In particular, this Lemma implies that = |r/| + 0(n 2 ). Instead, an error 
equal to n on cos(#) — cos(0) and on sin(#) — sin(0), would have rendered \\E 
|?/|\/2 + 0(i] 2 ). In the general case of Q S H" xp , with notation similar to Lemma 
by taking into account the general form of the matrix Q as product of rotators, and 
using Lemma 3.1 over and over, it is simple to realize that an error of order rj on the 
angles still gives an 0(n 2 ) error term on the diagonal of E, whereas an error 0(r/) for 
the cos and sin would give an 0(r/) error term for all entries of E. 

Remark 3.2. Of course, the ^-variables are more properly seen as variables on 
a torus. Indeed, also to avoid pathological cases in which the 9 values would grow 
unbounded, we always renormalizc their values to [—it, 7t], which we do by computing 
inverse tangent. This represents a drawback with respect to working directly with the 
(cos, sin) pairs. 

Remark 3.3. Householder methods based on the w- variables, and Givens' methods 
based on the 9- variables parametrize the sought Q using exactly p 2 "~ 2 p ~ 1 parameters: 
the minimal number required. 



Linear Algebra Involved. An advantage of using Householder and Givens transforma- 
tions to represent Q is that we can take advantage of the many clever ways in which 
these transformations can be manipulated, see [GVL|, and thus obtain efficient proce- 
dures. For example, we keep the matrix Q in factored form, and never form it (except 
when required for output purposes). Of course, we never perform matrix-matrix mul- 
tiplications either, but we exploit inner product arithmetic for Householder matrices 
and the simple structure of the rotators for Givens matrices. 

Discretization Schemes. We have chosen to use explicit integrators of RK type as the 
basic schemes. We are not interested in extremely accurate computations; the range of 
practical interest for us is between 10~ 2 and 10 -10 , and our schemes have been chosen 
with this accuracy demands in mind. If one needs more accurate computation, then 
different choices would be more appropriate. We chose formulas of order 4 and of 
order 5 with an associated embedded formula, to be used in variable stepsize mode. 
The formulas used are the well known Runge-Kutta 3/8-th rule, a scheme of order 4 
with an embedded scheme of order 3, and the formulas of Dormand-Prince of order 
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5 with the embedded scheme of order 4. For convenience, we give these pairs below. 
The lower order scheme is the one relative to the weights b. 





5 

3 

1 



12 



3/8-th Runge-Kutta 4-3 pair 



19^2 

m 



_56 
25^60 

m 

_33 





32 

64148 



384 () 



m — m 



3El 



too 

1113 







212 
J? 9 



5179 
57600 



bW 
1113 



217 
102 








5103 

486 



7571 
16695 



T25~ 

192 



& 



393 
640 



"2TS7 

6784 



_S4_ 



92097 187 J_ 
339200 2100 40 





Dormand-Prince 5-4 pair. 

Remark 3.4. With our schemes based on elementary transformations, one can 
form orthogonal approximations also at the internal RK points, not just at the grid 
points, at negligible extra cost. This is a nice fact, both for dense output purposes, 
and for forming A at the internal points. 

Updates. Whenever the updates are required, see (^£) and ( [2.17 ), we do not setup 
the updated matrices by using exact derivatives (as we did in [DV2]), since we can 
(and do) use the values obtained at the Runge-Kutta stages; this way, the update 
does not require extra function evaluations. 

Error Control. Finally, about error control for variable time-stepping integration. 
Most of the criteria we used are standard. We do mixed absolute/relative error 
control with respect to the input value T0L, in the way explained in [HNW, II-4], with 
some slightly different heuristics. For example we use a safety factor of 0.8 (it is f ac 
in [HNW]), we never allow a new stepsize to be bigger than four times the current 
stcpsize, and we choose the initial stepsize to be TOL 1 /^ 1 ), where q is the order of 
the estimator (i.e., q = 3 for the embedded 3/8th rule, and q = 4 for the embedded 
Dormand-Prince rule). But the most relevant changes to the standard strategies are 
due to the nature of our methods. In fact, triangularizing one column of X at the 
time, as we do, present some important advantages. First of all, we decide on step- 
size changes by monitoring the behavior of the error (in the oo-norm) on all columns 
independently, and then take the most conservative estimate for the next step. But, 
most importantly, there is no need to complete the entire integration step prior to 
rejecting the step! This is a pleasant outcome of the present implementation, since 
often (see Section |]) a step failure occurs ahead of having completed computation of 
all of Q. 



Reimbedding. We have used the c onstr uctio n bas ed o n (2.9)_ L or ( 2.21 ), and following 
discussion, only if the tests (2.11), or (2.10), or (3.4), or (2.2C), for the u, v, w, or 9 
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variables, respectively, failed. We have not succeeded in finding less expensive ways 
to obtain new initial conditions in case these tests failed. 



4. Codes &; Examples. We have written FORTRAN codes using the previous 
ideas. In the examples, we will refer to the performance of each of these codes, 
which differ by which formulas are used, and whether or not they are implemented 
in constant or variable stepsize modes. We use the following first letter convention: 
u=u- variables, v—v- variables, w=w-variables, t=9- variables. 

• Fixed stepsize codes. 

— 3/8th rule: urk38, vrk38, wrk38, trk38. Thus, for example, wrk38 
is a fixed stepsize implementation using the Runge-Kutta 3/8th rule of 
the Householder method based on the w-variables. 

— Dormand-Prince rule: udp5, vdp5, wdp5, tdp5. 

• Variable stepsize codes. The naming convention is as above, but now the first 
letter is a v to signify "variable stepsize" . 

— 3/8th pair: vurk38, vvrk38, vwrk38, vtrk38. For example, vvrk38 
is the variable stepsize implementation of the 3/8th pair for the House- 
holder method in w-variables. 

— Dormand-Prince pair: vudp5, vvdp5, vwdp5, vtdp5. 

We report on several measures of performance of the above codes. 

- err: the error between computed and exact Q, if the exact Q is known. 

- reimb: the number of reimbeddings needed for a given method; we increment 
the counter every time a change of coordinates is performed. 

- rejs/first: the number of total rejections (in variable stepsize mode) fol- 
lowed by the rejections occurring while triangularizing the first column. 

- cpu: the total CPU time needed to complete a given run normalized to 1 for 
the fastest run on the given problem. 

- nsteps: the total number of steps taken (in variable stepsize mode). 

For the variable stepsize codes, we also include comparison with a projected in- 



tegrator, prk45, which integrates (1.2) with the well known (and sophisticated) in- 
tegrator RKF45 of Netlib, and then uses modified Gram-Schmidt for the projection. 
We recall that RKF45 is a RK solver of order 5/4 whose performance is comparable 
with the Dormand-Prince 5/4 pair we adopted here. 

Example 4.1. This is a problem chosen because the underlying fundamental so- 
lution is both exponentially dichotomic and fast rotating. Bad projection methods 
have difficulties on this problem. Also Gauss RK schemes without Newton method 
run into serious stepsize restrictions. We have the coefficient matrix 

A(t) 



(3 cos(2crf) —a + (3 sin(2at) 
a + {3sm(2at) — /3cos(2ai) 



and we seek Q associated to the QR factorization of X : X' = AX, X(0) = I. 



The exact solution is X(t) 





-e"' 3 ' 



We fix (3 = 100, 



cos(at) sin(at) 
sin(crf) — cos(at) 

a = 100, and consider integration on the interval [0, b] with b = 10. The problem 
should cause difficulties to methods based on Householder transformations, because of 
the fast rotation of Q. This does indeed produce several reimbeddings for Householder 
methods, but no appreciable deterioration in accuracy. 
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Observe that for the t methods there is only one equation to integrate: 

8 = a- (3 sin(20(t)) cos(2crf) + (3 sin(2ai) cos(20(») , 0(0) = , 

and this has the exact solution 9(t) = at. So, one may expect no error while integrat- 
ing for 9. However, the reason why the t methods do not recover the exact solution 
is because we automatically renormalize angles to [— tt, tt] and this causes roundoff 
errors to enter in the picture; if we do not renormalize the angles, then the exact 
solution is recovered. 

Tables 4.1 and 4.2 summarize the results of our numerical experiments. For the 
fixed step methods in Table 4.1 the small error for the t methods is notable and this is 
mirrored by the superior performance for the variable step t methods (see Table 4.2). 
Note that the u methods fail on this problem. Also, observe how prk45 is considerably 
more expensive than the competing 5/4 codes (vvdp5, vwdp5). 



Table 4.1. Example 4.1: fixed stcpsize At = l.E - 3. 


Meth 


err 


reimb 


cpu 


tdp5 


3.1E- 13 





17.5 


trk38 


3.9E- 13 





14.2 


udp5 








urk38 








vdp5 


2.5E-9 


318 


24.0 


vrk38 


1.6E-6 


318 


18.9 


wdp5 


3.9E-8 


318 


18.9 


wrk38 


2AE - 6 


318 


15.5 



Table 4.2. Example 4.1: variable stepsize, tol= l.E — 8. 


Meth 


err 


reimb 


rejs 


cpu 


nsteps 


prk45 


1AE-8 






31.1 


20803 


vtdp5 


i.6E - 8 





162 


1.2 


599 


vtrk38 


2.bE- 8 





158 


1 


705 


vudp5 












vurk38 












vvdp5 


3.3E- 9 


318 





20.4 


9557 


vvrk38 


7.2E - 9 


318 


1 


61.4 


37931 


vwdp5 


3.0E - 9 


318 


718 


20.4 


11623 


vwrk38 


4.6£ - 9 


318 


1835 


46.0 


34317 



Example 4.2. Here we take the coefficients matrix 
A{t) = a(9(t) - sin(t)) 



1 
-1 



where 9(t) 



-(exp(— at) + asin(i) — cos(t)) and we fixed a = 100. Interval 



This 



cos(0(i))) sin(0(t)) 
-sin(0(i)) cos(6>(£)) 

is a difficult problem for all the methods considered, but all methods except prk45 



of integration is [0, 10] and exact solution is Q(t) 
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obtain accurate solutions as seen in Tables 4.3 and 4.4. The integrator RKF45 fails to 
integrate in one-step mode on the desired interval, though it succeeds on [0,9], much 
less efficiently than vtdp5, vvdp5, vwdp5. We expected the 9 methods to run into 
difficulty in variable stepsize mode, because of stiffness of the 9 differential equation, 
but did not notice that. 



Table 4.3. Example 4.2: fixed stepsize At = l.E- 3. 


Meth 


err 


reimb 


cpu 


tdp5 


1.5£J- 12 





169. 


trk38 


l.hE- 10 





136. 


vdp5 


&.2E- 12 





209. 


vrk38 


1.5£J- 10 





170. 


wdp5 


6.2E- 12 





163. 


wrk38 


1.6J5- 10 





131. 



Table 4.4. Example 4.2: variable stepsize, tol= l.E — 8. 


Meth 


err 


reimb 


rejs 


cpu 


nsteps 


prk45 












vtdp5 


5.3E- 9 





8 


1 


53 


vtrk38 


5.LE-9 





8 


3.0 


206 


vudp5 


l.BE-8 





10 


2.0 


93 


vurk38 


2.6E - 8 





1 


4.0 


280 


vvdp5 


6.9E- 9 





10 


2.9 


106 


vvrk38 


5.9E - 8 





8 


5.0 


263 


vwdp5 


1.3E-8 





12 


1.2 


66 


vwrk38 


6AE - 9 





20 


3.1 


238 



Example 4.3. This is a coefficients' matrix arising from a stiff two-point boundary 
value problem having both boundary and interior layers. We have 



A(t) = 



■ 

f/(2e) 

. 






1 








1 


1/2 











1/e 


1/e 


-t/(2c) 



and we take e = 10~ 2 . Interval of integration is [—1, 1]. Exact solution is not known. 
The u-methods fail to complete the integration, but all other methods perform well. 
For this problem, see Tables 4.6-(i) and 4.6(ii), most rejections occur after having 
computed all of Q. 
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Table 4.5. Example 4.2: fixed stepsize At = l.E - 3. 


Meth 


r/eimb 


cpu 


tdp5 


2 


8.7 


trk38 


2 


6.2 


vdp5 


3 


12.2 


vrk38 


3 


8.8 


wdp5 


3 


10.0 


wrk38 


3 


7.2 



Table 4.6-(i). Example |4.3|: variable stepsize, tol= l.E — 8. 


Meth 


reimbr 


ej s/f irs 1 


; cpu 


nsteps 


prk45 






1.4 


252 


vtdp5 


2 


11/2 


1 


221 


vtrk38 


2 


15/4 


1.7 


628 


vudp5 










vurk38 










vvdp5 


3 


9/1 


1.3 


217 


vvrk38 


3 


14/2 


2.7 


612 


vwdp5 


3 


10/2 


1.1 


228 


vwrk38 


3 


13/3 


2.3 


649 



Table 4.6-(ii). Example 4.3: column-rejections, tol= l.E — 8. 


Meth 


1st 


2nd 


3rd 


vtdp5 


2 


1 


8 


vtrk38 


4 





11 


vvdp5 


1 





8 


vvrk38 


2 





12 


vwdp5 


2 


1 


7 


vwrk38 


3 





10 



Example 4.4. This is a fairly simple problem quite useful for testing purposes. 
We have the coefficient matrix 



A(t) = Q(t)D(t)Q T (t) + Q(t)Q T (t) , 



where D(t) = diag(l, cos(t), -5^, -10), Q(t) - 

We report on results for a = 1 and (3 = y/2 and 





Qp(t) 





Q a (t) 

Qa(t) 



and Q~[{t) 



cos(7<) sin (7*) 
— sin(7t) cos(7t) 

integrate on [0, 100]. Notice the inadequacy of the u- variables. Such inadequacy is 
hidden in constant stepsize mode, but it becomes apparent in variable stepsize mode: 
the it- vector becomes very poorly scaled. In variable stepsize, nearly all rejections 
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occur while triangularizing the first column. From Table 4.8, observe how prk45 
takes more steps and is less accurate than vtdp5, vvdp5, vwdp5. 



Table 4.7. Example 4.4: fixed stepsize At = l.E - 3. 


Metri 


err 


reimb 


cpu 


tdp5 


1.6E- 10 


27 


22.7 


trk38 


1.5E- 10 


27 


16.9 


vdp5 


1.6E- 10 


77 


24.3 


vrk38 


1.5E- 10 


77 


19.1 


wdp5 


l.QE- 10 


77 


22.2 


wrk38 


1.5E- 10 


77 


17.3 



Table 4.8. Example 4.4: variable stepsize, tol= l.E — 8. 


Metri 


err 


reimb 


rej s/f irst 


cpu 


nsteps 


prk45 


2.1E-7 






1.1 


5053 


vtdp5 


7.7E-9 


27 


94/89 


1 


4533 


vtrk38 


1.2E-8 


27 


107/100 


2.1 


13010 


vudp5 












vurk38 












vvdp5 


1.2E-8 


77 


80/80 


1.1 


3967 


vvrk38 


1.2E-8 


77 


99/99 


2.1 


11169 


vwdp5 


1.4E-8 


77 


103/92 


1.1 


4370 


vwrk38 


2.8E- 8 


77 


125/115 


2.3 


12694 



Example 4.5. Here we take the diagonal coefficient matrix 
A{t) = diag(- -10, cos(t), 1) , 



which is just a permutation of D(t) in Example 4.4. We take both Q(0) = I and 
a random initial condition Q(0), to verify if the diagonal elements of D will become 
reordered. When Q(Q) = I, then the computed solution remains Q(t) — I for all t. 
For randomly chosen Q(0), the change of variables Q does instead reorder the diagonal 
of A: An > • • • > A44 (see Figure 4.1; results obtained with vtdp5). 

Example 4.6. This is an example of a constant coefficients matrix. The equation 



(1.2) reduces to a so-called isospectral flow. For these problems, it is known that the 
diagonal of the upper (p, p) block of the transformed matrix must eventually converge 
to the real parts of the leading eigenvalues of A. Far from advocating the ideas set forth 
in this paper as a mean to solve the eigenvalue problem, we included this problem 
because we wanted to highlight (on a problem of arbitrarily high dimensions) how most 
stepsize rejections occur ahead of having found all of Q; for this reason, we report 
on results obtained with the variable stepsize codes vtdp5, vvdp5, vwdp5, and, for 
comparison, with prkf45, in spite of the fact that probably constant coefficients 
problems can be efficiently solved also with constant stepsize. Also, we wanted to see 
how ill conditioning (i.e., poor separation) of the eigenvalues affected convergence. 
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100 




1e-12 



1e-14 



1e-16 - 



1e-18 



"tol=1.E-8" 
"tol=1.E-4" 
"tol=1.E-2" 




50 



100 150 200 250 300 350 



400 



Fig. 4.1. Plot of t vs. log 10 (||D(t) — diag(A(t))||oo) for vtdp5 applied to Example 4-.t for 
different tolerances using a "random" Q(0). 



The coefficient matrix is the upper Hessenberg Frank matrix: 



A = 



I n 
n — 1 


V o 



71 — 1 

n — 1 
n-2 



n-2 
n-2 
n-2 



We take n = 25, and seek Q triangularizing X £ M nxp , X' = AX, X(Q) 



... 1\ 
... 1 
... 1 

l' lJ 

nxp 



with p — 13. Interval of integration is [0, b]. Exact solution is not known, so we report 
on the defect of the diagonal of the upper (p,p) block of the transformed matrix A(b) 
from the real parts of the leading eigenvalues; this we list as errd. At four digits, the 
eigenvalues are 77.9837, 60.5984, 47.7777, 37.5667, 29.2021, 22.2856, 16.5772, 11.9193, 
8.2006, 5.3359, 3.2479, 1.8495, 1.1841 ± 0.2634i, 0.8147 ± 0.6124i, 0.4098 ± 0.7755*, 
0.0051 ± 0.7737i, -0.3373 ± 0.603H, -0.5452 ± 0.3177i, -0.6070. 
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In the next table, we report on selected runs with the variable stepsize codes 
for b = 100. The value of errd is entirely due to lack of convergence to the real 
part of the 13-th eigenvalue, 1.1841, all larger eigenvalues having already converged 
to several digits. Notice that (for most cases) errd does not change appreciably by 
decreasing the tolerance value, which indicates that only by increasing the length 
of integration one would be able to approximate the 13th eigenvalue more accu- 
rately. Also, we notice that our codes are clearly superior to prkf45. Finally, in 
our codes, a large majority of rejections occur while integrating for the 4th or 5th 
columns; for example, for vtdp5 with tol=l.E-6, the rejection patterns is as fol- 
lows: [1, 0, 11, 372, 104, 4, 0, 0, 1, 14, 1, 4, 3], where each number in this vector refers to 
rejections occurred while triangularizing columns 1 to 13. 



Table 4.9. Example |4. 6 


: variable stepsize. 


Meth 


tol 


errd 


reimb 


rejs 


cpu 


nsteps 


prk45 


l.E-4: 


1.8E-1 






1.7 


5365 


vtdp5 


l.E-4 


9.6E-2 


73 


525 


1.5 


2391 


vvdp5 


l.E-4 


1.LE0 


77 


516 


1.2 


2459 


vwdp5 


l.E-4 


3.0E0 


85 


504 


1 


2462 


prk45 


l.E-6 


1.8E-1 






1.7 


5430 


vtdp5 


l.E-6 


1.8E-1 


65 


515 


1.6 


2459 


vvdp5 


l.E-6 


5.6E-2 


35 


501 


1.2 


2491 


vwdp5 


l.E-6 


1.1E-1 


34 


476 


1. 


2481 



5. Conclusions. The purpose of this work has been manifold. We discussed gen- 
eral design choices people should have in mind when devising techniques for solving 
(1.2). We reinterpreted methods based on Householder and Givens transformations 
as methods which give a trajectory on the smooth manifold of orthonormal matri- 
ces, and use overlapping local charts to parameterize the manifold. We gave new 
formulations and new implementations of methods based on Householder and Givens 
transformations. Finally, we presented a suite of FORTRAN codes for solving (1.2) by 
our techniques. We believe that the methods put forward in this work, and their 
implementation as laid down here, are a very sensible way to solve ( |l.2| ), and should 
not be ignored by anyone interested in comparative performance of other techniques 
for solving ( \l.2\ ). In this spirit, we welcome other people to use our codes^. 

Our study showed that all design scopes we had set at the beginning are satisfied 
by appropriate implementation of our techniques. There is certainly room for improv- 
ing our codes, and we anticipate some work towards more efficient implementation, 
in particular insofar as data structure, memory requirement, and extensive use of the 
BLAS libraries. However, we believe that our implementations are now sophisticated 
enough that some conclusions and recommendations for future work can be given. 

1. Much as one should have expected, in variable stepsize the dp5 codes outper- 
form the rk38 codes. In constant stepsize, however, the rk38 codes are less 
expensive and this makes up for their reduced accuracy. 

2. A striking difference with the standard linear algebra situation occurs when 
using Householder methods. There, different representations of the House- 
holder vectors are chiefly a matter of storage efficiency. In our time de- 
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pendent setting, however, Householder methods in the it-variables should be 
discarded: they are prone to instabilities, and this is clearly betrayed in a vari- 
able stepsize implementation. Instead, in the w-variables, integrated with a 
local projection step, and - especially - the w-variables, Householder methods 
are robust and also handle well large problems. 

Givens methods based on the ^-variables revealed very accurate and effi- 
cient. However, the frequent need to compute sines and cosines and inverse 
trigonometric functions is a potential drawback with this approach. A care- 
ful implementation based on direct integration of the (cos, sin) values has not 
been carried out, but could be an interesting endeavor. 

The implementation in variable stepsize has proved very rewarding, especially 
since it confirmed our expectation that often most rejections occur far ahead 
of having computed all of Q. 

It would be interesting to implement our methods by exploiting the rewriting 



.it), and hence solving (|2.2|) always starting near 



This may prove 





beneficial for all methods, also those based on the u- variables. 
With the present level of implementation, and all things considered, the meth- 
ods based on 9 and w variables are probably the best, followed by the v- 
mcthods. 

The relative comparison with the projected integrator prk45 appears favor- 
able to our new codes. In particular, our codes generally require fewer steps, 
and are more accurate and less expensive. Moreover, the vtdp5, vvdp5, 
vwdp5 codes never failed to complete the integration; instead, prk45, in spite 
of the certainly more sophisticated implementation of the integrator RKF45, 
was occasionally unable to complete the integration. 



REFERENCES 



[BGGS] G. Benettin, G. Galgani, L. Giorgilli, J. M. Strelcyn, Lyapunov exponents for 
smooth dynamical systems and for Hamiltonian systems; a method for computing all 
of them. Part I: Theory. . . . Part II: Numerical applications, Meccanica 15 (1980), 
pp. 9-20,21-30. 

[BR] T. Bridges and S. Reich, Computing Lyapunov exponents on a Stiefel manifold, 

preprint, University of Surrey, (2000). 
[CIZ] M. P. Calvo, A. Iserles, and A. Zanna, Numerical solution of isospectral flows, 

Math. Comp. 66 (1997), pp. 1461-1486. 
[C] M.T. Chu, On the continuous realization of iterative processes, SIAM Review 30 (1988), 

375-387. 

[CS] T.F. Coleman and D.C. Sorensen, A note on the computation of and orthonormal 

basis for the null space of a matrix, Mathematical Programming 29 (1984), 234—242. 
[Da] Davey, A., An Automatic Orthonormalization Method for Solving Stiff EVPs, J. 

Comp. Phys. 51 (1983), pp. 343-356. 
[DNT] P. Deift, T. Nanda, and C. Tomei, Ordinary differential eguations and the symmetric 

eigenvalue problem, SIAM J. Numer. Anal. 20 (1983), no. 1, 1-22. 
[DRV1] L. Dieci, R. D. Russell, E. S. Van Vleck, Unitary Integrators and Applications to 

Continuous Orthonormalization Techniques, SIAM J. Numer. Anal. 31 (1994), pp. 

261-281. 

[DRV2] L. Dieci, R. D. Russell, and E. S. Van Vleck, On the computation of Lyapunov 
exponents for continuous dynamical systems, SIAM J. Numer. Anal. 34 (1997), 
402-423. 

[DV1] L. Dieci and E. S. Van Vleck, Computation of a few Lyapunov exponents for contin- 

uous and discrete dynamical systems, Appld. Numer. Math. 17 (1995), 275-291. 

[DV2] L. Dieci AND E. Van Vleck, Computation of orthonormal factors for fundamental 

solution matrices, Numer. Math. 83 (1999), 599-620. 



ORTHONORMAL INTEGRATORS 



25 



[DV3] L. DlECl AND E. Van Vleck, Continuous orthonormalization for linear two-point 

boundary value problems revisited, IMA Volumes in Math. Appl. 118 (1999), 69-90. 
[DLP] F. Diele, L. Lopez, and R. Peluso, The Cayley transform in the numerical solution 

of unitary differential systems, Adv. Comp. Math. 8 (1998), 317—334. 
[ER] J. P. Eckmann, D. Ruelle, Ergodic theory of chaos and strange attractors, Rev. Mod. 

Phys. 57 (1985), 617-656. 
[GSO] I. Goldhirsch, P. L. Sulem, and S. A. Orszag, Stability and Lyapunov stability of 

dynamical systems: a differential approach and a numerical method, Physica D 27 

(1987), 311-337. 

[GVL] G. H. Golub and C. F. Van Loan, Matrix computations, 2nd ed., The Johns Hopkins 

University Press, 1989. 

[HL] E. Hairer, C. Lubich, The life-span of backward error analysis for numerical integra- 

tors, Numer. Math. 76 (1997), 441-462. 

[HNW] E. Hairer, S. P. Ncersett, and G. Wanner, Solving ordinary differential equations I, 
Springer- Vcrlag, Berlin-Heidelberg, 1993, Second edition. 

[H] D. HlGHAM, Time-stepping and preserving orthonormality, BIT 37 (1997), 24-36. 

[MR] V. Mehrmann and W. Rath, Numerical methods for the computation of analytic sin- 

gular value decompositions, Elec. Trans. Numer. Anal. 1 (1993), 72-88. 

[Me] Meyer, G. H. Continuous Orthonormalization for Boundary Value Problems, J. Comp. 

Phys. 62 (1986), pp. 248-262. 

[Mu] H. Munthe-Kaas, Runge-Kutta methods on Lie groups, BIT 38 (1998), 92-111. 



